Binary coding of speech spectrograms using a deep auto-encoder
Authors
Abstract
This paper reports our recent exploration of the layer-by-layer learning strategy for training a multi-layer generative model of patches of speech spectrograms. The top layer of the generative model learns binary codes that can be used for efficient compression of speech and could also be used for scalable speech recognition or rapid speech content retrieval. Each layer of the generative model is fully connected to the layer below, and the weights on these connections are pre-trained efficiently by using the contrastive divergence approximation to the log-likelihood gradient. After layer-by-layer pre-training we "unroll" the generative model to form a deep auto-encoder, whose parameters are then fine-tuned using back-propagation. To reconstruct the full-length speech spectrogram, individual spectrogram segments predicted by their respective binary codes are combined using an overlap-and-add method. Experimental results on speech spectrogram coding demonstrate that the binary codes produce a log-spectral distortion that is approximately 2 dB lower than a subband vector quantization technique over the entire frequency range of wide-band speech.
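The reconstruction and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `overlap_add` and `log_spectral_distortion`, the plain averaging of overlapping frames (a rectangular window), and the per-frame RMS definition of log-spectral distortion are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def overlap_add(segments, hop):
    """Combine equal-length spectrogram segments into one spectrogram.

    segments: list of segments; each segment is a list of frames and each
              frame is a list of per-bin values (hypothetical layout).
    hop:      number of frames between the starts of consecutive segments.
    Frames covered by more than one segment are averaged (assumed window).
    """
    seg_len = len(segments[0])
    n_bins = len(segments[0][0])
    if not 0 < hop <= seg_len:
        raise ValueError("hop must be in 1..segment length so no frame is uncovered")

    total = hop * (len(segments) - 1) + seg_len
    acc = [[0.0] * n_bins for _ in range(total)]
    count = [0] * total

    for i, seg in enumerate(segments):
        for j, frame in enumerate(seg):
            t = i * hop + j
            count[t] += 1
            for k, value in enumerate(frame):
                acc[t][k] += value

    # Normalize each frame by the number of segments that contributed to it.
    return [[v / count[t] for v in acc[t]] for t in range(total)]


def log_spectral_distortion(ref_power, est_power, eps=1e-12):
    """RMS difference in dB between two power spectra of a single frame."""
    diffs = [
        10.0 * math.log10(r + eps) - 10.0 * math.log10(e + eps)
        for r, e in zip(ref_power, est_power)
    ]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

For example, two 2-frame segments with a hop of 1 frame yield a 3-frame spectrogram whose middle frame is the average of the two overlapping frames. A distortion figure for a whole utterance would then be the mean of the per-frame values.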
Similar resources
The Diagnosis of Brucellosis in Rafsanjan City Using Deep Auto-Encoder Neural Networks
Introduction: Brucellosis is considered one of the most important infectious diseases shared by humans and animals. Given the endemic nature of brucellosis and the numerous reports of human and animal cases in Iran, the incidence of human brucellosis in Rafsanjan city was determined over the last 3 years (2016–2018). The main objective of this study was t...
Deep Denoising Auto-encoder for Statistical Speech Synthesis
This paper proposes a deep denoising auto-encoder technique to extract better acoustic features for speech synthesis. The technique allows us to automatically extract low-dimensional features from high dimensional spectral features in a non-linear, data-driven, unsupervised way. We compared the new stochastic feature extractor with conventional mel-cepstral analysis in analysis-by-synthesis and...
Updating the silent speech challenge benchmark with deep learning
The 2010 Silent Speech Challenge benchmark is updated with new results obtained in a Deep Learning strategy, using the same input features and decoding strategy as in the original article. A Word Error Rate of 6.4% is obtained, compared to the published value of 17.4%. Additional results comparing new auto-encoder-based features with the original features at reduced dimensionality, as well as d...
Using an autoencoder with deformable templates to discover features for automated speech recognition
In this paper we show how we can discover non-linear features of frames of spectrograms using a novel autoencoder. The autoencoder uses a neural network encoder that predicts how a set of prototypes called templates need to be transformed to reconstruct the data, and a decoder that is a function that performs this operation of transforming prototypes and reconstructing the input. We demonstrate...
Journal title:
Volume, issue:
Pages: -
Publication date: 2010